Several studies have shown that the stress pattern of one’s native language
is applied to new linguistic stimuli. Regarding the segmentation of artificial
synthesized speech, this idea has been supported by experiments with
languages in which the stress pattern coincides with word boundaries (e.g.
English, Finnish and Dutch). In this study, we present data on speech
segmentation by native speakers of Spanish, a language whose predominant
stress pattern marks the penultimate syllable of words. Results show that
stressing the middle syllable of trisyllabic words in an artificial speech
stream does not facilitate segmentation, contrary to what would be predicted.
Possible explanations of these results are explored in relation to the
interaction of statistical and stress cues in speech segmentation.


Research on speech segmentation has indicated that, when facing the
problem of segmenting a novel speech stream, be it an unfamiliar or
artificial language, listeners apply the patterns of their native
language to the new stimuli (Cutler et al., 1986; Johnson &
Jusczyk, 2001; Sanders & Neville, 2000; Vroomen et al., 1998).

Mainly, these studies try to understand how different cues present in
the speech signal, such as language rhythm, phonotactics, vowel harmony,
or stress patterns, are used by a learner to segment the words composing the
language. Among these cues, the statistical regularities in speech have been
widely studied (e.g. Saffran, 2002; Saffran, Aslin & Newport, 1996; Saffran,
Newport & Aslin, 1996; Newport & Aslin, 2004; Peña, Bonatti, Nespor &
Mehler, 2002). These studies have demonstrated that human adults and
infants can effectively extract regularities on-line from a speech stream by
computing transitional probabilities among the items composing the
words in the stream. They have also established a reliable methodology for
studying how different acoustic cues interact when a listener faces the
problem of speech segmentation (Vroomen, Tuomainen & de Gelder, 1998),
using mainly artificial synthesized stimuli.

Regarding the interactions between different cues during the
segmentation of speech by adults, Saffran, Newport & Aslin (1996) found
that lengthening the final syllable improved performance in a
segmentation task. This facilitatory cue was thought to be available to
listeners across many linguistic environments since, in natural utterances,
the final syllable tends to be lengthened with respect to the others. Also, in
Experiment 3 of their study, Vroomen et al. (1998) showed that native
speakers of Finnish and Dutch performed better in a segmentation task
when initial stress was added to the words composing an artificial speech
stream. Native speakers of French, however, did not benefit from this
modification. The authors explained these results by pointing to
differences in the stress patterns of Finnish, Dutch, and French. Finnish and
French are languages with fixed lexical stress: multisyllabic content words
in Finnish always have stress on the initial syllable, while in French stress is
on the final syllable. In Dutch, most words have stress on the
initial syllable, although there are a few variations (Vroomen et al., 1998).
The authors therefore concluded that a listener’s performance in a
segmentation task improves when the phonological cues in the stream match
those of the listener’s native language (p. 145). However, the results of the
French participants could also be explained by their difficulty in perceiving
lexical stress, as shown in the studies of Dupoux and collaborators
(Dupoux, Christophe, Sebastián-Gallés & Mehler, 1997; Dupoux,
Peperkamp & Sebastián-Gallés, 2001). It is not surprising, then, that French
speakers did not take advantage of the stress cue.

Results obtained with infants also suggest an early sensitivity to the
predominant stress pattern in speech segmentation. Seven-and-a-half-month-old
English learners can segment strong-weak, but not weak-strong,
bisyllabic words out of a stream, which coincides with the most common
pattern in English. It is not until 10.5 months of age that infants can
integrate other stress patterns, such as weak-strong (Jusczyk, Houston &
Newsome, 1999). Regarding the use of conflicting cues in speech
segmentation, Johnson & Jusczyk (2001), using speech streams composed
of the same trisyllabic words as in the Saffran, Aslin & Newport (1996)
experiments, observed that 8-month-old infants raised in an English-speaking
environment relied more heavily on stress cues than on statistical
cues when the stress was put on the initial syllable of the word. The
notion that language-specific stress patterns are applied to new linguistic
stimuli was thus supported. Finally, Thiessen & Saffran (2003) demonstrated
that by 9 months of age, English-learning infants still rely more on a
trochaic stress pattern than on statistical cues when segmenting an artificial
language, which replicates the results of Johnson & Jusczyk (2001). They
also showed that this preference seems to develop with age, since at 7
months of age infants rely more heavily on statistical cues.

To our knowledge, research on the interaction of different cues
in segmentation tasks has mainly been conducted on languages
with stress primarily on the initial syllable of a word, such as English,
Finnish and Dutch, or on languages that exclusively stress the final syllable,
such as French. It is not yet known what would happen in the case of
languages without a stress-initial pattern, in particular languages with
stress-medial patterns, that is, languages in which stress does not
coincide with one of the boundaries of the word and whose native speakers
actively use stress information in lexical search, as is the case in Spanish
(e.g. Soto-Faraco, Sebastián-Gallés & Cutler, 2001).

In Spanish all multisyllabic words have one syllable marked for 
primary stress. Penultimate stress is predominant (in 75% to 80% of the 
words, the syllable marked for primary stress is the second-to-last syllable; 
Harris, 1983; Quilis, 1984). This is also the case for trisyllabic words, for 
which Spanish has a stress-medial pattern (Navarro, 1966). The percentage 
of trisyllabic words in Spanish that stress the medial syllable is 73.52% (as
calculated over the LEXESP database, which contains over 5,020,930 word
items; Sebastián-Gallés, Martí, Carreiras & Cueto, 2000). For a Spanish
speaker, then, it is very likely that the preferred stressed syllable in a
trisyllabic stimulus would mark the middle section of the word, and not one 
of its boundaries. 

If the hypothesis that stress patterns from a participant’s native
language are applied to novel stimuli is correct, then stressing the medial
syllable in a speech stream should facilitate the segmentation process in
Spanish speakers, since Spanish has a stress-medial pattern in trisyllabic
words. To test this hypothesis, an experiment using artificial languages
with different stress patterns was run. We created three artificial streams
with a structure identical to those used by Saffran, Aslin & Newport
(1996) to study speech segmentation with statistical information. In these
three streams, either the first, second or third syllable of every word was
stressed, thus yielding different stress patterns. As a baseline, a fourth
stream with flat stress was also created. Using this stream, we wanted to be
sure that participants could reliably segment out the words using only
statistical cues.


EXPERIMENT 

METHOD 

Participants. The participants were 85 undergraduate Psychology 
students at the Universitat de Barcelona (n=17 in each condition). They 
were native speakers of Spanish. No hearing deficits or problems in 
language acquisition were reported by any participant. They were given
course credits for their participation in the study.


Stimuli and apparatus. Four artificial languages that differed only in
the stress pattern of their words were used. Languages were formed by
four trisyllabic nonsense words (hereafter referred to as “words”; tupiro,
golabu, bidaku, padoti). These four words were concatenated in a stream
following the same order as in the above-mentioned study, so that there were
no immediate repetitions of words. Statistical regularities were
controlled, so that the probability of a given syllable being followed by
another given one was 1.0 within a word, while it was 0.33 between words. The
sequence of words was synthesized using the text-to-speech MBROLA
software (Dutoit et al., 1996) with a Spanish male diphone database (see
footnote 1) at 16 kHz. The most important feature of the speech stream was
that there were no acoustic markers between words, so the only cues available
to segment out the words were the statistical ones (flat stress language).
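
The transitional probabilities described above can be illustrated with a minimal sketch. It assumes a randomly ordered stream with no immediate word repetitions, which is not the exact word order used in the experiment; the function names, the stream length and the random seed are hypothetical choices made for illustration only.

```python
import random
from collections import Counter

# The four nonsense words of the artificial languages.
WORDS = ["tupiro", "golabu", "bidaku", "padoti"]


def syllabify(word):
    """Split a word into its two-letter CV syllables."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


def build_stream(n_words, seed=0):
    """Concatenate words at random with no immediate repetitions.

    Assumption: a random order stands in for the fixed order of the
    original study, which is not reproduced here.
    """
    rng = random.Random(seed)
    syllables, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != previous])
        syllables.extend(syllabify(word))
        previous = word
    return syllables


def transitional_probabilities(syllables):
    """Estimate P(next syllable | current syllable) from adjacent pairs."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: count / first_counts[pair[0]]
            for pair, count in pair_counts.items()}


stream = build_stream(6000)
tp = transitional_probabilities(stream)
print("within word  (tu -> pi):", tp[("tu", "pi")])
print("across words (ro -> go):", round(tp[("ro", "go")], 3))
```

Because every syllable occurs in only one word, within-word transitions are fully predictable, whereas each word-final syllable can be followed by the first syllable of any of the three other words.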

To create the stressed languages, one syllable of each word was
modified. This modification was done by raising a given syllable’s
pitch by 20 Hz. To create the stress-initial, the stress-medial, and
the stress-final languages, the first, the second, or the third
syllable of each word was changed, respectively. To control for any
undesirable rhythmic effect of introducing a stress change every three
syllables, a fifth language was constructed using the same syllables as in
the previous ones. This time, however, they were arranged in a completely
random order, so that there was no reliable statistical cue to group
syllables; the only regularity was that one in every three syllables was
stressed. This language was synthesized using the same
procedure, and lasted the same as all the other languages. In total, then, we
created three stressed languages (stress-initial, -medial, and -final), one
language with flat stress, and one language with a random organization of
syllables.


1 available at http://tcts.fpms.ac.be/synthesis/mbrola.html 
To assess the segmentation of the stream, eight test items were
created using the same synthesis procedure used for the languages. There
were four words and four part-words. None of them was stressed. Words
were the same four items that composed the stream. Part-words were made
by concatenating one syllable of a word and two syllables of another
one, and were tibida, kupado, rogola, butupi.
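
The construction of the part-words can be checked with a short sketch. Judging from the items listed, each part-word joins the final syllable of one word to the first two syllables of another; the helper names below are hypothetical and serve only as an illustration.

```python
WORDS = ["tupiro", "golabu", "bidaku", "padoti"]
PART_WORDS = ["tibida", "kupado", "rogola", "butupi"]


def syllabify(word):
    """Split a word into its two-letter CV syllables."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


def decompose(part_word):
    """Return the (first word, second word) pair that a part-word straddles.

    A part-word is matched against the last syllable of one word followed
    by the first two syllables of a different word.
    """
    target = syllabify(part_word)
    for first in WORDS:
        for second in WORDS:
            if first == second:
                continue
            if syllabify(first)[-1:] + syllabify(second)[:2] == target:
                return first, second
    return None


for item in PART_WORDS:
    print(item, "<-", decompose(item))
```

Each part-word therefore spans a word boundary, so it contains one low-probability transition (0.33) followed by one within-word transition (1.0).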


Procedure. The experiment was run individually in a sound-attenuated
booth, on a PC using the EXPE programming language
(Pallier, Dupoux, and Jeannin, 1997). Stimuli were played through
Sennheiser (HMD224) headphones connected to the computer via a
Proaudio Spectrum 16 soundcard. Each participant was exposed to one of
the languages. After 7 minutes of listening to the language, they were given
a two-alternative forced-choice (2AFC) test. They were presented with
word/part-word pairs, with an interval of 500 ms between the two items, and
had to answer by pressing either the “1” or the “2” key on a response box,
indicating whether the first or the second trisyllabic grouping was more
likely to be a word in the language. The next test pair was presented either
when the participant pressed a response key, or when 5 seconds had elapsed
after the presentation of the previous test pair. There were eight test
trials, each one consisting of the presentation of a test pair in which the
order of word and part-word was balanced.

RESULTS 

Results for the five conditions are shown in Figure 1. An ANOVA
with condition as the between-subjects variable revealed significant
differences in percentage of correct responses (F(4, 80)=10.851, p<0.001).
Post-hoc analysis showed two homogeneous subsets with no differences
within each subset. The first one was composed of the stress-flat,
stress-initial, and stress-final conditions. Percentage of correct responses
for subjects presented with the stress-flat language was 70.59% (SD=18.19).
This percentage was significantly different from chance (t(16)=4.667,
p<0.001), demonstrating that subjects could segment the words from the
speech stream using statistical cues alone. This result resembles previously
reported ones for adult subjects with part-word foils (65% in Saffran,
Newport & Aslin, 1996), suggesting a strong effect for segmentation with
statistical cues alone. For subjects presented with the stress-initial language
the percentage of correct responses was 71.32% (SD=16.39), which also
differed from chance (t(16)=5.362, p<0.001). Finally, for subjects
presented with the stress-final language, percentage of correct responses
was 76.47% (SD=16.46), and differed from chance too (t(16)=6.628,
p<0.001). Comparisons in responses for stress-flat and stress-initial
languages with the stress-final language yielded non-significant results (for
all comparisons, t<1). So, stressing the first or the last syllable in the words
composing the stream did not have any effect on the segmentation results of
Spanish speakers, when compared with the results from the flat-stress
language.
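
As a worked check, the comparisons against chance for these three conditions can be re-derived from the reported means and standard deviations. The sketch below assumes a one-sample t-test against a chance level of 50% with n=17 per condition; the function and variable names are hypothetical, and small deviations from the reported values are to be expected because the summary statistics are rounded.

```python
import math

CHANCE = 50.0  # chance level (%) in a two-alternative forced-choice test
N = 17         # participants per condition

# Reported mean percentage correct and standard deviation per condition.
SUMMARY = {
    "stress-flat": (70.59, 18.19),
    "stress-initial": (71.32, 16.39),
    "stress-final": (76.47, 16.46),
}


def one_sample_t(mean, sd, n, mu0=CHANCE):
    """t statistic for H0: population mean equals mu0 (df = n - 1)."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error


for condition, (mean, sd) in SUMMARY.items():
    t_value = one_sample_t(mean, sd, N)
    print(f"{condition}: t({N - 1}) = {t_value:.3f}")
```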

The second subset was composed of the stress-medial and the
random conditions. Percentage of correct responses for subjects presented
with the random language was 49.26% (SD=18.99). This result did not
differ from chance (t(16)=-0.160, p=0.875). So, as expected, when the
stream contained no reliable statistical cue to group syllables, participants
were at chance during the test. This result implies first, that there are no a
priori preferences for either type of test item (words or part-words) in the
present experiment, and second, that introducing a stress change every
three syllables has no grouping effect by itself during the segmentation of
the stream. Concerning the results of the stress-medial condition, subjects
had a percentage of correct responses of 41.17% (SD=25.68). This result
was also not significantly different from chance (t(16)=-1.417, p=0.176).
Not only did stressing the middle syllable of the words fail to help
the subjects obtain higher scores, but with this stress pattern the subjects’
performance dropped to chance levels.
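
The omnibus test can also be approximated from the summary statistics of the five conditions. The sketch below assumes a standard one-way between-subjects ANOVA with equal group sizes (n=17); the function name is hypothetical. The resulting F should land close to the reported F(4, 80)=10.851, with small differences expected because the means and standard deviations are rounded.

```python
N_PER_GROUP = 17

# Reported mean percentage correct and standard deviation per condition.
SUMMARY = {
    "stress-flat": (70.59, 18.19),
    "stress-initial": (71.32, 16.39),
    "stress-final": (76.47, 16.46),
    "random": (49.26, 18.99),
    "stress-medial": (41.17, 25.68),
}


def anova_from_summary(means, sds, n):
    """One-way ANOVA F ratio from group means and SDs (equal group sizes).

    Returns (F, df_between, df_within).
    """
    k = len(means)
    grand_mean = sum(means) / k  # valid because all groups have size n
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    ms_within = sum(sd ** 2 for sd in sds) / k  # pooled variance, equal n
    return ms_between / ms_within, k - 1, k * (n - 1)


means = [mean for mean, _ in SUMMARY.values()]
sds = [sd for _, sd in SUMMARY.values()]
f_value, df_between, df_within = anova_from_summary(means, sds, N_PER_GROUP)
print(f"F({df_between}, {df_within}) = {f_value:.3f}")
```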

Taken together, these results run counter to what was
originally predicted. When stress fell on the middle syllable of trisyllabic
words, it did not facilitate the segmentation of the speech stream by Spanish
listeners. Even more striking is the fact that performance dropped to chance
levels for the stress-medial language, suggesting a possible conflict between
statistical and stress cues. While statistical cues would mark the boundaries
of a word at the first and third syllables, stress would make the second one
more salient, possibly drawing attention and processing resources towards it.
Results suggest that this conflict was not present in the stress-initial and
stress-final languages, since performance with them was not significantly
different from performance with the stress-flat language.

GENERAL DISCUSSION 

Contrary to what had been hypothesized, stressing the middle syllable
of words composing an artificial speech stream did not help
Spanish-speaking participants in a segmentation task. Moreover, it made
performance drop to chance levels. Stressing the initial syllable did
not improve performance either, as it did in studies whose participants’
native language was mostly stressed on the initial syllable (e.g. Vroomen et
al., 1998). For Spanish speakers, the results for the stream without stress
and for the streams with stress on either the initial or the final syllable
were the same.



Figure 1. Mean percentage (and standard errors) of correct responses
for the different experimental conditions in the present experiment.
[Figure not reproduced; surviving axis labels: Initial, Medial, Final.]


Does this mean that listeners do not apply the stress pattern of their
native language to new speech stimuli? When stress marks one of the
boundaries of words, as in the case of stress-initial languages (English,
Finnish) or stress-final languages (French), it helps the listener to segment
out the words. But when stress falls on the middle syllable of trisyllabic
words, it seems to conflict with the statistical information that marks the
beginning and ending of words. It may be that stressing the initial syllable
makes the listener pay more attention to the beginning of the word. In this
way it would not conflict with the statistical information for
Spanish speakers, and their performance would therefore be equivalent to
that with flat stress speech streams, as was observed in this study. As for
the effect of stressing the middle syllable, it is likely that it draws the
attention of the listener to a syllable that does not mark a limit of the word,
and this is reflected in chance-level performance in the test.

Unfortunately, there are no extensive studies that describe the
mechanisms involved in the segmentation of continuous speech by Spanish
speakers. There are, however, indications that stress cues are indeed taken
into account during segmentation in Spanish (Sebastián-Gallés & Costa,
1997), and that they play an important role in the activation of lexical
entries (Soto-Faraco, Sebastián-Gallés & Cutler, 2001). Recent models of
speech segmentation propose that different cues contribute to
speech segmentation (from lexical to suprasegmental; Mattys, White, and
Melhorn, 2005), and that their weight, or relative importance, may
change depending on specific properties of different languages. So, for
example, stress and syllabic information may contribute differently to the
segmentation of speech in languages such as Spanish and English. While
stress would be more important in English, the syllable would be more
important in Spanish. The extent to which the present results would
generalize to other languages is therefore an open question, as the
importance of stress changes may vary across them.

As mentioned earlier, in around 75% of Spanish words the
second-to-last syllable carries primary stress. One could suggest that this
percentage is not enough for listeners to actually apply this pattern to new
artificial stimuli. Nevertheless, in English, around 63% of the words have
primary stress on the first syllable (taking also into account monosyllabic
words; Cutler, 1990), and it has been shown that speakers of English apply
their native language stress pattern in segmentation experiments (e.g.
Sanders & Neville, 2000; Vroomen et al., 1998). The percentage of such words
in Spanish should therefore be sufficient to create a stress pattern that
native speakers would apply to new stimuli, such as those used in the present
study. In any case, variability in the position of stress in Spanish would not
predict that stressing the middle syllable of words in an artificial speech
stream would bring segmentation performance to chance levels.

It is thus possible that the salience of certain statistically coherent
groupings is increased by subtle phonetic modifications, such as the ones
used in this study. Stressing the middle syllable may lead the listener to
extract regularities that do not necessarily coincide with the ones needed
to segment the trisyllabic words out of the stream. Thus, the effect of stress
found in previous studies using English speakers would likely reflect an “edge
effect”. In languages that stress the first syllable of words, such as English,
the perceptual properties of the stimuli would coincide with the linguistic
ones, so both perceptual salience and lexical stress would mark the
beginning of words. In this way, both language-specific and general
perceptual information would coincide, allowing for easier word
segmentation. In any case, this is an interesting possibility that further
research with other languages should investigate. As for the results
presented here, they suggest that, under certain conditions, perceptual
salience may override language-specific stress patterns during a word
segmentation task.


SUMMARY

Stress position and word segmentation in speakers of
Spanish. Several studies have shown that the stress pattern of the native
language is applied to novel linguistic stimuli. In the field of the
segmentation of synthesized speech, this idea has received support from
experiments with languages in which the stress pattern coincides with
word boundaries (e.g. English, Finnish and Dutch). In this study,
we present data on speech segmentation in speakers of Spanish,
whose stress pattern tends to mark the penultimate syllable of words.
The results show that stressing the middle syllable of the trisyllabic
words in an artificial speech stream does not facilitate their segmentation,
as might have been predicted. Possible explanations of
these results are therefore explored, insofar as they relate to the
interaction of statistical and stress cues during speech segmentation.